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Summary. A recent theoretical result on optimized Schwarz algorithms 
demonstrated at the algebraic level enables the modification of an existing 
Schwarz procedure to its optimized counterpart. In this work, it is shown how 
to modify a bilinear FEM based Schwarz preconditioning strategy originally 
presented in [Fis97] to its optimized version. The latter is employed to pre- 
condition the pseudo-Laplacian operator arising from the spectral element 
discretization of the magnetohydrodynamic equations in Elsasser form. 



1 Introduction 

This work concerns the preconditioning of a pseudo-Laplacian operator^ as- 
sociated with the saddle point problem arising at each time-step in a spectral 
element based adaptive MHD solver. The approach proposed herein is a mod- 
ification of the method developed in [Fis97] where an overlapping Schwarz 
preconditioner was constructed using a low oder discretization. The latter 
approach is based on the spectral equivalence between finite-elements and 
spectral elements [CHQZ07, Kim06]. The finite-element blocks, representing 
the additive Schwarz, are replaced by so called optimized Schwarz blocks 
[SCGTOTa]. Two types of meshes, employed to construct the Qi block pre- 
conditioning are investigated. The first one is cross shaped and shows good 
behavior for additive Schwarz (AS) and restricted additive Schwarz (RAS). 
However improved convergence rates of the optimized RAS (ORAS) version 
are completely dominated by the corner effects [CNN06]. Opting for a sec- 
ond grid that includes the corners seems to correct this issue. For the zeroth 
order optimized transmission condition (OO) an exact tensor product form 
is available while for the 02 version a slight error is introduced in order to 



^A.k.a: consistent Laplacian or approximate pressure Schur complement. 
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preserve the properties of the operators and enable the use of fast diagonaliza- 
tion techniques introduced for spectral elements in [Cou95]. It is shown how 
to modify trivially an existing fast-diagonalization procedure and numerical 
experiments demonstrate the efficiency of the modification. 

2 Governing equations and discretization 

For an incompressible fluid with constant mass density po, the magnetohy- 
drodynamic (MHD) equations are: 

dtu + u-Vu = -Vp + bfj xh + pV'^u, (1) 

a^b = V X (u X b) + ryV^b (2) 

V • u = 0, V • b = (3) 

where u and b are the velocity and magnetic field (in Alfven velocity units, 
b = B/y'/uo/Oo with B the induction and /xo the permeability); p is the pressure 
divided by the mass density, and p and rj are the kinematic viscosity and the 
magnetic resistivity. In Elsasser form, the equations are [Els50]: 

dtZ^ + Z=F • VZ=^ + Vp - i/=^V^Z± - z/^FV^Z^F = (4) 

V • Z± = , (5) 

with = u± b and = ^{u ±r]). The velocity u and magnetic field 
b can be recovered by expressing them in terms of Z^. Since all time-scales 
are of interest, an explicit second order Runge-Kutta scheme is applied to 
discrctizc the time-derivative of the above system while, for the spatial part, a 
Pjv — Pjv-2 spectral element formulation was chosen to prevent the excitation 
of spurious pressure modes using the spaces 

:=|w = X;^^iW^''e'' I eHi(£))VM&w = 7ona£)| (6) 

Hi(D):={/ \feL2{D)kd,.f&'L2{D)yn}, (7) 

with w = u, b, Z^, p and their test functions, and q restricted to finite- 
dimensional subspaccs of these spaces: 

Z± e = Uzo n Pjv, e = Uo f) Pjv, 

p,ge Y^-2=L2(£') fl Pjv-2 

see for instance [MPR92, Fis97]. The basis for the velocity expansion in P]v is 

the set of Lagrange interpolating polynomials on the Gauss-Lobatto-Lcgcndre 
(GL) quadrature nodes, and the basis for the pressure is the set of Lagrange 
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interpolants on the Gauss-Legendre (G) quadrature nodes. In the spectral 
element method formalism, the domain, D, is composed of a union of non- 
overlapping subdomains, or elements, E^j: D = [J^^i E/j, and functions in 
and Y^~^ are represented as expansions in terms of tensor products of basis 
functions within each subdomain. The complete discretization at each stage 
is: 

Zf = Zf" - \At M-\UC^Z^ - Djp± + v±\.Z^ + u^LZj). (8) 

rC 

We require that each Runge-Kutta stage obeys (5) in its discrete form, so 

multiplying (8) by D,, summing, and setting the term Z- = leads to the 
following pseudo-Poisson equation for the pressures, p : 

D^-^Djp^ = D^gf, (9) 

where the quantity 

gf = ^At M-\MC^zf + v±LZ^ + v^Lz]) - 

is the remaining inhomogeneous contribution (see [RPM07]). More details on 
the various operators can be found in [DFM02]. Equation (9) is solved using 
a preconditioned iterative Krylov method. 



3 Prom classical to optimized Schwarz 

The principle behind optimized Schwarz methods consists into replacing the 
Dirichlet transmission condition present in the classical Schwarz approach by 
a more general boundary condition. This idea was first analyzed by Lions in 
[Lio88] where a Robin condition was introduced. The latter contained a posi- 
tive parameter that could possibly be used to enhance convergence. However, 
until recently, it was not clear how to define optimally that parameter for 
the new conditions at the interfaces between subdomains. Optimized Schwarz 
methods are derived from a Fourier analysis of the continuous elliptic partial 
differential equation, see for instance [Gan06] and references therein. Starting 
from the problem stated at the continuous level, suppose that a linear elliptic 
operator C with forcing / and boundary conditions V needs to be solved on 
D. An algorithm that can be employed to solve the global problem £u = / is 

/ inEi 

g on^DnEi (10) 
u" on Fij 

where the sequence with respect to n will be convergent for any initial guess 
vP with 

Fij = d% n Ej 7^ {0}. 
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This is none other than 
the classical Schwarz al- 
gorithm at the continuous 
level corresponding to RAS 
at the matrix level. In Fig. 
1, f}^ represents the quad- 
rangulation of the over- 
lapping domain E^. The 
optimized version of the 
above algorithm replaces 
the transmission conditions 
between subdomains by 

= B.jul, on r,, 

where Bij makes each sub- 
problem well posed and can be a function of optimizable parameters: the algo- 
rithm, like in the classical case, converges to the solution of Lu with V{u) = g 
on D. The discrete algebraic version is 

with Cp^, Cpp and C corresponding to the the discrete expressions of the new 
transmission conditions. At this point notice that A^^ is exactly the same block 
as in the original Schwarz algorithm. A simple (block) Gaussian elimination 
leads to the following preconditioned system 

{/ - X] {R''fBkjR^}u = J^iR'fi^ir'R'f (11) 

j,k=l fe=l 

with A^^ = A'l;^ - A'yp{Cpp)~'^Cp-. Thus, the optimized restricted additive 

Schwarz preconditioning is expressed as Pqras — J2k=i(-^'')^ ■ 
The above results are completely algebraic and independent of the under- 
lying space discretization method. The complete proof in the additive and 
multiplicative case with and without overlap can be found in [SCGT07a]. In 
the case of two subdomains it can be shown that the optimal transmission 
operator is the Schur complement [SCGT07b]. Also in weighted residual tech- 
niques the artificial boundary conditions should be properly weighted. 

4 Qi formulation and optimized Schwarz 

The Qi formulation for the Schwarz algorithm can be found in (cite). We 
merely express here the changes necessary in order to obtain an optimized 
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Fig. 1. One overlapping element 
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version. First, the overlap depicted in Fig. 1 is the minimal one. Secondly, 

the normal or tangential derivatives expressions at the boundaries must not 
involve more than 2 points. Including more would destroy the optimized iter- 
ates as mentioned in [SCGTOTa] (remark 1). Both requirements are satisfied 
by the Qi FEM formulation naturally. 

It is important to handle the assembly of the Ql operators correctly at 
the endpoints. Thus we build our reference interval on the extended grid 
to include one additional node at each end of the interval. We then begin 
the assembly (direct stiffness summation) starting at the second node on the 
left, and continuing until the second- to-last node on the right. In this way, 
the negative-sloping linear FEM shape function on the left most subinterval 
and the positive-sloping shape function on the rightmost subinterval are not 
included in the assembly. The general idea is illustrated in figure 2. 



subinterval 




• Gauss grid node | Added node 

Extended Gauss grid node | Gauss-Lobatto node end-point 



Fig. 2. Schematic of Ql assembly. The left- and right-most dashed shape functions 
are not included in the assembly. 



We can define for the linear problem in equation (10) a general transmis- 
sion condition [SCGT07a] between each element 



dn 



T{uj,p, q, t) 



n+l 



duj 
dn 



-T{uj,p,q,T) 



where the interface blocks T{uj,p,q,T) = puj — q-g^, define a transmission 
condition of order 2 with two parameters, p and q, specified in [Gan06], and 
also provided for completeness in table 1. In general, p and q are different 
depending on whether there is overlap, but here we consider only the case of 
finite overlap. 







q 


OOO, overlap Ch 
002, overlap Ch 


2-''/"[kl,^ + r!f'\Ch)-^'" 




2-'/"ikl,,, + ri)-'/''{Chf/' 



Table 1. Choices for the parameters p and q used in the interface blocks. OOj 
stands for optimized of order j. 
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4.1 FDM 

When rectangular elements are considered a fast diagonalization method 
(FDM) (e.g., [Cou95]) can be used to invert the optimized Qi blocks. The 
number of operations required to invert TV'' x A'^'' matrix using such a tech- 
nique is 0{N'^^-^ ) and the apphcation of the inverse is performed using efhcient 
tensor products in 0{N'^) operations. We propose the form 

Au = {M + Toq) ^{K + Top) + {K + T^p) ® (M + Tog) (12) 

as the optimized block matrix where Tq is a matrix almost completely filled 
with zeros with the exception of the entries (1,1) and {N, N) which are set 
to 1. This form implements the weak form of the transmission conditions, 
Equation (12), directly in the optimized block. An. Notice that in order for 
the fast diagonalization technique to be applicable, the coefficients p and q 
must be constant on their face. The modified matrix M+Toq is still symmetric 
and positive definite while the matrix K + ToP is still symmetric. This enables 
the use of the modified mass matrix in an inner product and the simultaneous 
diagonalization of both tensors. When <? = the proposed formula is exact; 
however, when g 7^ a slight error is made at the corners. If the quantity 
(Top) ® {T^q) was removed then the expression in the 02 would also be exact. 

5 Numerical experiments 

We have implemented the RAS preconditioner described above in the MHD 
code. This version allows for variable overlap of the extended grid. The ORAS 
counterpart has also been implemented as described by starting from the RAS, 
and for comparison, we can use a high-order block Jacobi (B J) method as well. 
We consider first tests of a single pseudo-Poisson solve. We use a periodic 
grid of = 8 X 8 elements, and iterate using BiCGStab until the residual is 
10~* times that of the initial residual. The extended grid overlap is 2, and the 
initial starting guess for the Krylov method is composed of random noise. The 
first test, uses non-FDM preconditioners to investigate the effect of including 
corner transfers on the optimization. The results are presented in Fig. 3, in 
which we consider only the OO optimization. Note that even thought the RAS 
is much less sensitive to the corner communication, especially at higher A'^,,, 
the 00 with corners requires fewer iterations. Clearly, corner communication 
is crucial to the proper functioning of the optimized methods. 

In the next experiment, all the parameters are maintained except we use 
a grid of i? = 16 x 16 elements together with the FDM version of the pre- 
conditioners to investigate performance. These results are presented on the 
right-most figure of Fig. 3. 
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Fig. 3. left: Plot of iteration count vs GLL-expansion node number for different 
preconditioners with and without corner communication on an 8 x 8 element grid. 
right: Comparison of of CPU time vs. GLL-expansion node number of FDM-based 
preconditioners with corner communication on a 16 x 16 element grid. 

6 Conclusions and future directions 

It is shown that a simple modification of a RAS in a low order FDM based 
preconditioning of the the pseudo-Laplacian operator can reduce the time 
to solution by up to a factor of two for high order GLL expansions. Also, 
as expected from the work of [CNN06], we find that the cross form of the 
subdomains is not suitable as is for the optimized version of the algorithm: 
corners need to be included. Upcoming work will concern the inclusion of a 
coarse solver in this approach and a treatment for non-conforming elements. 
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